AITopics | data factor

Revisiting Link Prediction: A Data Perspective

Mao, Haitao, Li, Juanhui, Shomer, Harry, Li, Bingheng, Fan, Wenqi, Ma, Yao, Zhao, Tong, Shah, Neil, Tang, Jiliang

arXiv.org Artificial Intelligence, Feb-6-2024

Link prediction, a fundamental task on graphs, has proven indispensable in various applications, e.g., friend recommendation, protein analysis, and drug interaction prediction. However, since datasets span a multitude of domains, they could have distinct underlying mechanisms of link formation. Evidence in existing literature underscores the absence of a universally best algorithm suitable for all datasets. In this paper, we endeavor to explore principles of link prediction across diverse datasets from a data-centric perspective. We recognize three fundamental factors critical to link prediction: local structural proximity, global structural proximity, and feature proximity. We then unearth relationships among those factors: (i) global structural proximity shows effectiveness only when local structural proximity is deficient, and (ii) feature proximity and structural proximity can be incompatible. This incompatibility leads GNNs for Link Prediction (GNN4LP) to consistently underperform on edges where the feature proximity factor dominates. Inspired by these new insights from a data perspective, we offer practical instruction for GNN4LP model design and guidelines for selecting appropriate benchmark datasets for more comprehensive evaluations.
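To make the three factors named in the abstract concrete, here is a minimal sketch that scores candidate node pairs on a toy graph. The specific heuristics are assumptions chosen for illustration, not taken from the abstract: common-neighbor count stands in for local structural proximity, BFS hop distance for global structural proximity, and cosine similarity of node features for feature proximity. The graph, the feature vectors, and all function names are hypothetical.

```python
import math
from collections import deque

def common_neighbors(adj, u, v):
    """Local structural proximity (assumed heuristic): shared-neighbor count."""
    return len(adj[u] & adj[v])

def shortest_path_length(adj, u, v):
    """Global structural proximity (assumed heuristic): BFS hop distance.

    Returns math.inf when v is unreachable from u.
    """
    if u == v:
        return 0
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == v:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return math.inf

def cosine_similarity(x, y):
    """Feature proximity (assumed heuristic): cosine of the two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

# Hypothetical toy graph: a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
adj = {n: set() for n in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

# Hypothetical node features.
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0], 3: [0.0, 1.0], 4: [1.0, 0.0]}

for u, v in [(0, 3), (0, 4)]:
    print(
        (u, v),
        "common neighbors:", common_neighbors(adj, u, v),
        "hops:", shortest_path_length(adj, u, v),
        "feature cosine:", cosine_similarity(feats[u], feats[v]),
    )
```

In this toy setup the pair (0, 4) shares no neighbors and is several hops apart, yet its feature vectors are identical. That is the kind of pair where only a feature-based signal would suggest a link.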

dataset, link prediction, proximity, (16 more...)

2310.00793

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Michigan (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Information Technology > Services (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Data Factors for Better Compositional Generalization

Zhou, Xiang, Jiang, Yichen, Bansal, Mohit

arXiv.org Artificial Intelligence, Nov-7-2023

Recent diagnostic datasets on compositional generalization, such as SCAN (Lake and Baroni, 2018) and COGS (Kim and Linzen, 2020), expose severe problems in models trained from scratch on these datasets. However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability. In this work, to reconcile this inconsistency, we conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors, including dataset scale, pattern complexity, example difficulty, etc. First, we show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges. To further understand this improvement, we show two axes of the benefit from more complex datasets: they provide more diverse examples so compositional understanding becomes more effective, and they also prevent ungeneralizable memorization of the examples due to reduced example repetition frequency. Finally, we explore how training examples of different difficulty levels influence generalization differently. On synthetic datasets, simple examples invoke stronger compositionality than hard examples do. On larger-scale real language datasets, while hard examples become more important potentially to ensure decent data coverage, a balanced mixture of simple and hard examples manages to induce the strongest generalizability. The code and data for this work are available at https://github.com/owenzx/data4comp

better compositional generalization, data factor

2311.04420

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.53)

Google's new AI tool could help decode the mysterious algorithms that decide everything (ZDNet)

#artificialintelligence, Nov-21-2019, 15:46:45 GMT

While most people encounter algorithms every day, few can claim to really understand how AI works. A new tool unveiled by Google, however, aims to help ordinary users grasp the complexities of machine learning. Dubbed "Explainable AI", the feature promises to do exactly what its name describes: to explain to users how and why a machine-learning model reaches its conclusions. To do so, the explanation tool will quantify how much each feature in the dataset contributed to the outcome of the algorithm. Each data factor will have a score reflecting how much it influenced the machine-learning model.
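As a rough illustration of per-feature scoring, the sketch below attributes a linear model's prediction to its inputs relative to a baseline input. This is not Google's tool or its method. It is a generic attribution scheme for linear models, and the feature names, weights, and baseline are all hypothetical.

```python
def predict(weights, bias, x):
    """Linear model: bias plus the weighted sum of the features."""
    return bias + sum(weights[name] * x[name] for name in weights)

def linear_attributions(weights, x, baseline):
    """Score each feature by weight * (value - baseline value).

    For a linear model these scores sum exactly to
    predict(x) - predict(baseline).
    """
    return {name: weights[name] * (x[name] - baseline[name]) for name in weights}

# Hypothetical model and inputs, for illustration only.
weights = {"income": 0.5, "age": -0.2, "debt": -1.0}
bias = 0.1
x = {"income": 4.0, "age": 3.0, "debt": 0.5}
baseline = {"income": 2.0, "age": 3.0, "debt": 0.0}

scores = linear_attributions(weights, x, baseline)
for name, score in sorted(scores.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {score:+.2f}")
```

A positive score means the feature pushed the prediction above the baseline prediction, and a negative score means it pushed it below. Non-linear models need more elaborate attribution methods to get scores with the same additive property.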

algorithm, google, mysterious algorithm, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.42)

Bayes Imbalance Impact Index: A Measure of Class Imbalanced Dataset for Classification Problem

Lu, Yang, Cheung, Yiu-ming, Tang, Yuan Yan

arXiv.org Machine Learning, Jan-29-2019

Recent studies have shown that imbalance ratio is not the only cause of the performance loss of a classifier in imbalanced data classification. In fact, other data factors, such as small disjuncts, noise, and class overlap, also play a role in tandem with the imbalance ratio, which makes the problem difficult. Thus far, empirical studies have demonstrated only the relationship between the imbalance ratio and other data factors. To the best of our knowledge, there is no measure of the extent to which class imbalance influences classification performance on imbalanced data. Further, it is also unknown, for a given dataset, which data factor is actually the main barrier to classification. In this paper, we focus on the Bayes optimal classifier and study the influence of class imbalance from a theoretical perspective. Accordingly, we propose an instance measure called Individual Bayes Imbalance Impact Index ($IBI^3$) and a data measure called Bayes Imbalance Impact Index ($BI^3$). $IBI^3$ and $BI^3$ reflect the extent of influence purely by the factor of imbalance in terms of each minority class sample and the whole dataset, respectively. Therefore, $IBI^3$ can be used as an instance complexity measure of imbalance, and $BI^3$ is a criterion showing the degree to which imbalance deteriorates classification. As a result, we can use $BI^3$ to judge whether it is worth using imbalance recovery methods, such as sampling or cost-sensitive methods, to recover the performance loss of a classifier. The experiments show that $IBI^3$ is highly consistent with the increase in prediction score achieved by the imbalance recovery methods, and $BI^3$ is highly consistent with the improvement in F1 score achieved by the imbalance recovery methods, on both synthetic and real benchmark datasets.
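The sketch below illustrates the general idea of isolating the effect of class imbalance on a Bayes optimal classifier. It does not reproduce the paper's $IBI^3$ or $BI^3$ definitions, which the abstract does not give. It compares the minority-class posterior for one sample under the observed class prior with the posterior under a balanced prior. The 1-D Gaussian class-conditional densities, the class counts, and all names are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def imbalance_ratio(n_majority, n_minority):
    """Imbalance ratio: majority count divided by minority count."""
    return n_majority / n_minority

def minority_posterior(x, prior_minority, minority_params, majority_params):
    """Bayes posterior P(minority | x) for two Gaussian classes."""
    p_min = prior_minority * gaussian_pdf(x, *minority_params)
    p_maj = (1.0 - prior_minority) * gaussian_pdf(x, *majority_params)
    return p_min / (p_min + p_maj)

# Hypothetical setup: 900 majority vs 100 minority samples.
n_majority, n_minority = 900, 100
prior = n_minority / (n_majority + n_minority)
minority_params = (1.0, 1.0)    # (mean, std), assumed
majority_params = (-1.0, 1.0)   # (mean, std), assumed

x = 1.0  # a sample sitting at the minority class mean
observed = minority_posterior(x, prior, minority_params, majority_params)
balanced = minority_posterior(x, 0.5, minority_params, majority_params)

print("imbalance ratio:", imbalance_ratio(n_majority, n_minority))
print("posterior with observed prior:", round(observed, 3))
print("posterior with balanced prior:", round(balanced, 3))
```

In this setup the sample falls below the 0.5 decision threshold under the observed prior but above it under the balanced prior, so its misclassification is attributable to imbalance alone rather than to overlap or noise.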

correlation, imbalance recovery method, minority class sample, (12 more...)

1901.10173

Country:

North America > United States > New York (0.04)
Asia > Macao (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.74)
